Spatial Point Pattern Analysis of Airbnb Listings in Singapore
To investigate whether the distribution of Airbnb listings is affected by location factors such as proximity to existing hotels, MRT services and tourist attractions.
To analyse the impact of COVID-19 on the Airbnb business in Singapore by comparing Airbnb listings data from June 2019 and June 2021.
In this analysis, the following datasets were used:
| Data | Description | Format | Source |
|---|---|---|---|
| Airbnb listings for June 2019 and June 2021 | Airbnb listings information including room type and location | CSV | Airbnb Listing |
| Hotels, tourist attractions | Locations of hotels and tourist attractions extracted using OneMap API | CSV | OneMap API |
| MRT Stations | Point representation to indicate the location of the MRT station. | SHP | LTA Datamall |
| Singapore Coastal Outline | Coastal outline shapefile from Data.gov.sg | SHP | Data.gov.sg |
In this analysis, the following eight R packages and the OneMap API will be used:
sf, a relatively new R package specially designed to import, manage and process vector-based geospatial data in R.
spatstat, which has a wide range of useful functions for point pattern analysis. In this hands-on exercise, it will be used to perform 1st- and 2nd-order spatial point patterns analysis and derive kernel density estimation (KDE) layer.
raster, which reads, writes, manipulates, analyses and models gridded spatial data (i.e. raster). In this hands-on exercise, it will be used to convert the image output generated by spatstat into raster format.
maptools which provides a set of tools for manipulating geographic data. In this hands-on exercise, we mainly use it to convert Spatial objects into ppp format of spatstat.
tmap which provides functions for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API.
tidyverse contains a set of essential packages for data manipulation and exploration.
ggplot2, which allows users to create static plots.
plotly, which is used for creating interactive plots.
OneMap SG API, which is used to query Singapore geospatial data from different themes.
Use the code chunk below to install and launch the 8 R packages.
library(caret)
library(dplyr)
library(ggmap)
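As a minimal sketch, the packages named in the list above could be checked before they are attached. `missing_packages` is a hypothetical helper introduced here for illustration and is not part of the original post. Note also that maptools was retired from CRAN in 2023, so it may not install from the default repository.

```r
# Packages named in the list above
pkgs <- c("sf", "spatstat", "raster", "maptools",
          "tmap", "tidyverse", "ggplot2", "plotly")

# Hypothetical helper: return the packages that are not installed
missing_packages <- function(p) {
  installed <- vapply(p, requireNamespace, logical(1), quietly = TRUE)
  p[!installed]
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install what is missing, then attach everything:
# install.packages(to_install)
# invisible(lapply(pkgs, library, character.only = TRUE))
```

The install and attach steps are left commented out so that the check itself has no side effects beyond loading namespaces.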
In this section, read_csv() of the readr package (part of tidyverse) will be used to import the Airbnb data sets into R.
We will name the Airbnb listing data dated June 2021 airbnb21, while the data dated June 2019 will be named airbnb19.
According to Airbnb, the value of the availability_365 column is based on the following:
The distribution of availability_365 for June 2019 is plotted in the graph below.

A quick glance at the data suggests that there are more than 1,300 listings with availability equal to 0. It is highly unlikely that all of them are fully booked.
The distribution of availability_365 for June 2021 is plotted in the graph below.

Likewise, in 2021 there are 400 listings with availability equal to 0.
There is a high probability that most of these listings belong to hosts who no longer plan to accept guests but have yet to remove their listings. Hence, we will proceed to filter these listings out.
Next, we will plot a graph to find out the distribution of the number of reviews in airbnb19.

There are more than 2,500 listings with 0 reviews. Possible reasons include:
Again, we will plot a graph to find out the distribution of the number of reviews in airbnb21.

We will then remove listings whose number of reviews equals 0, since they may be inactive listings.
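The two filters described above (availability and number of reviews) can be sketched on a toy table. The data frame below is made up for illustration. Only the column names availability_365 and number_of_reviews come from the listings data.

```r
# Toy stand-in for the listings table (values are made up)
listings <- data.frame(
  id = 1:5,
  availability_365 = c(0, 120, 365, 0, 30),
  number_of_reviews = c(10, 0, 4, 0, 25)
)

# Keep listings that are available on at least one day
# and that have received at least one review
active <- subset(listings, availability_365 > 0 & number_of_reviews > 0)
print(active$id)
```

The same condition could equally be written with filter() from dplyr. Base R is used here only to keep the sketch free of dependencies.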
Coordinate Reference System: NA
Coordinate Reference System: NA
A quick CRS check shows that no Coordinate Reference System (CRS) has been assigned yet.
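A minimal sketch of how a CRS could be assigned is shown below. It assumes that the latitude and longitude columns are WGS84 (EPSG:4326) and that the intended target is SVY21 (EPSG:3414), which matches the projection parameters printed later in this post. The single toy point is made up.

```r
library(sf)

# Toy listing with WGS84 longitude/latitude (made-up location)
toy <- data.frame(id = 1, longitude = 103.85, latitude = 1.29)

# Declare the coordinates as WGS84 (EPSG:4326),
# then project them to SVY21 (EPSG:3414), which uses metres
toy_sf <- st_as_sf(toy, coords = c("longitude", "latitude"), crs = 4326)
toy_svy21 <- st_transform(toy_sf, crs = 3414)

print(st_crs(toy_svy21)$epsg)
print(st_coordinates(toy_svy21))
```

Working in a projected CRS matters here because the distance-based methods used later assume planar coordinates in metres.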
# A tibble: 0 x 16
# ... with 16 variables: id <dbl>, name <chr>, host_id <dbl>,
# host_name <chr>, neighbourhood_group <chr>, neighbourhood <chr>,
# latitude <dbl>, longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>,
# last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>
# A tibble: 75 x 16
id name host_id host_name neighbourhood_g~ neighbourhood
<dbl> <chr> <dbl> <chr> <chr> <chr>
1 2838555 1 Bedroom~ 1.45e7 <NA> Central Region Geylang
2 2840554 Spacious ~ 1.45e7 <NA> Central Region Geylang
3 3140972 Ground fl~ 1.45e7 <NA> Central Region Geylang
4 3144267 Furnished~ 1.45e7 <NA> Central Region Geylang
5 3258894 Nicely Fu~ 1.45e7 <NA> Central Region Geylang
6 3617090 Cosy 1 Be~ 1.45e7 <NA> Central Region Geylang
7 3671451 Nice and ~ 1.45e7 <NA> Central Region Geylang
8 3752662 Nice 1 Be~ 1.45e7 <NA> Central Region Geylang
9 3790364 Spacious ~ 1.45e7 <NA> Central Region Geylang
10 3859180 Ground fl~ 1.45e7 <NA> Central Region Geylang
# ... with 65 more rows, and 10 more variables: latitude <dbl>,
# longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>,
# last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>
From the output, it appears that airbnb21 has no missing values, while airbnb19 has rows with a missing host_name. Hence, we will proceed to drop these rows.
[1] 0
[1] 0
All geometries are valid. We can now plot airbnb19 and airbnb21.
From the above plot, we see some suspicious listings in strange locations, e.g. in parks or in sensitive installations such as military camps.
We will proceed to remove these listings by using filter().
Likewise, we will proceed to remove these listings by using filter().
The code chunk below uses as_Spatial() of the sf package to convert the three geospatial data sets from simple feature data frames to sp's Spatial* classes.
spatstat requires the analytical data to be in ppp object form. There is no direct way to convert Spatial* classes into ppp objects, so we first need to convert the Spatial* classes into generic sp objects.
The code chunk below converts the Spatial* classes into generic sp objects.
Check the properties of the sp objects:
class : SpatialPoints
features : 4275
extent : 7215.566, 43591.65, 25170.77, 47923.77 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
class : SpatialPoints
features : 2257
extent : 7406.989, 43337.89, 25330, 48127.23 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +datum=WGS84 +units=m +no_defs
Now, we will use as.ppp() function of spatstat to convert the spatial data into spatstat’s ppp object format.
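The post performs this conversion through sp objects. As an alternative sketch that bypasses sp, a ppp object can also be built directly from the coordinates of an sf object. This is not the method used in the post. The three points and the 100 m padding of the window are made up for illustration.

```r
library(sf)
library(spatstat)

# Three made-up points already in projected (metre) coordinates
pts <- st_as_sf(
  data.frame(x = c(28000, 30000, 31500), y = c(30000, 32000, 29000)),
  coords = c("x", "y"), crs = 3414
)

xy <- st_coordinates(pts)

# Rectangular window: the bounding box padded by 100 m (an arbitrary choice)
win <- owin(xrange = range(xy[, "X"]) + c(-100, 100),
            yrange = range(xy[, "Y"]) + c(-100, 100))

# Build the point pattern from plain coordinate vectors
pts_ppp <- ppp(x = xy[, "X"], y = xy[, "Y"], window = win)
print(npoints(pts_ppp))
```

In the actual analysis the window would be the Singapore coastal outline rather than a padded bounding box.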
We can check for duplicated points in a ppp object by using the code chunk below.
[1] TRUE
[1] TRUE
The code chunk below applies the jittering approach, which adds a small random displacement to every point so that duplicated points no longer share the same coordinates.
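The duplicate check and the jittering step can be sketched on a toy pattern as follows. The coordinates and the jitter radius of 0.01 are arbitrary choices for illustration, not values from the original analysis.

```r
library(spatstat)
set.seed(42)

# Toy pattern in which two points share the same coordinates
# (ppp() warns about the duplicates, which is expected here)
X <- ppp(x = c(1, 1, 2, 3), y = c(1, 1, 2, 3),
         window = owin(c(0, 4), c(0, 4)))
print(any(duplicated(X)))

# Jitter: displace every point by a small random amount,
# retrying until the displaced point stays inside the window
X_jit <- rjitter(X, radius = 0.01, retry = TRUE, nsim = 1, drop = TRUE)
print(any(duplicated(X_jit)))
```

Jittering keeps every point, which is why it is preferred here over simply deleting duplicates.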
Next, we will:
use read_csv to read the hotel and tourist attraction data
use st_read to read the MRT station data
use readOGR to read the coastal outline data
OGR data source with driver: ESRI Shapefile
Source: "C:\lye-jia-wei\IS415_New_Blog\_posts\2021-09-21-take-home-exercise-2\data\aspatial", layer: "CostalOutline"
with 60 features
It has 4 fields
Reading layer `MRTLRTStnPtt' from data source
`C:\lye-jia-wei\IS415_New_Blog\_posts\2021-09-21-take-home-exercise-2\data\aspatial\MRTLRTStnPtt.shp'
using driver `ESRI Shapefile'
Simple feature collection with 185 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
When analysing spatial point patterns, it is good practice to confine the analysis to a geographical area such as the Singapore boundary. In spatstat, an object called owin is specially designed to represent this polygonal region.
The code chunk below is used to convert the sg SpatialPolygon object into an owin object of spatstat.

In the code chunk below, rescale() is used to convert the unit of measurement from metres to kilometres.
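A minimal sketch of the window and rescaling steps is given below. It uses a made-up rectangular window in place of the coastal outline, so the names `win_m`, `X_m` and `X_km` are illustrative only.

```r
library(spatstat)

# Made-up rectangular study area in metres (a stand-in for the coastal outline)
win_m <- owin(xrange = c(0, 50000), yrange = c(0, 40000),
              unitname = c("metre", "metres"))
X_m <- ppp(x = c(10000, 25000, 40000), y = c(5000, 20000, 35000),
           window = win_m)

# Divide all coordinates by 1000 and relabel the unit as kilometres
X_km <- rescale(X_m, 1000, "km")

print(X_km$x)
print(unitname(X_km))
```

After rescaling, any bandwidth or distance passed to later functions must also be expressed in kilometres.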
Kernel Density Map of Airbnb (2019)

Kernel Density Map of Tourist Attractions

Kernel Density Map of Hotels

Kernel Density Map of MRT Stations

The result is the same; we only convert it so that it is suitable for mapping purposes.
Next, we will convert the gridded kernel density objects into RasterLayer objects by using raster() of the raster package.
The code chunk below will be used to include the CRS information.
Kernel density maps of Tourist Attractions on Openstreetmap of Singapore
Kernel density maps of Airbnb Listings (2019) on Openstreetmap of Singapore
Kernel density maps of Hotels on Openstreetmap of Singapore
Kernel density maps of MRT station on Openstreetmap of Singapore




PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Airbnb listings are randomly distributed.
H1 = Airbnb listings are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G-function
Generating 40 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40.
Done.
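The number of simulations determines the significance level that an envelope test can attain. According to the spatstat documentation for envelope(), a two-sided pointwise test has level 2 * nrank / (1 + nsim). The sketch below applies that formula. The two helper functions are hypothetical names introduced here for illustration.

```r
# Significance level of a pointwise envelope test
# (nrank = 1 means the most extreme simulated values form the limits)
envelope_alpha <- function(nsim, nrank = 1, two_sided = TRUE) {
  sides <- if (two_sided) 2 else 1
  sides * nrank / (nsim + 1)
}

# Smallest nsim whose significance level does not exceed the target alpha
nsim_for_alpha <- function(alpha, nrank = 1, two_sided = TRUE) {
  sides <- if (two_sided) 2 else 1
  ceiling(sides * nrank / alpha - 1)
}

print(envelope_alpha(40))     # level implied by the 40 simulations logged above
print(nsim_for_alpha(0.001))  # simulations needed to test at alpha = 0.001
```

Under this formula, a test at the stated alpha of 0.001 would need far more simulations than the 40 logged above, so the envelopes shown should be read as a much less stringent test.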

To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = MRT stations are randomly distributed.
H1 = MRT stations are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G-function
Generating 40 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40.
Done.

To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Tourist attractions are randomly distributed.
H1 = Tourist attractions are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G-function
Generating 40 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40.
Done.

To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Hotels are randomly distributed.
H1 = Hotels are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G-function
Generating 40 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40.
Done.

In this section, we will use appropriate tmap functions to display the locations of the Airbnb listings by room type, and describe the spatial patterns observed.
We plot an interactive map of the Airbnb listings by room type, with OpenStreetMap of Singapore as the background. By showing the 4 room types as differently coloured bubbles, the map makes it easy to identify which areas have the most listings and which room type is the most common.
At first glance, it is very clear that the Central Region has the most Airbnb listings and that "entire home/apt" is the most frequently listed room type in the Central Region. However, if we look at the whole map, it is interesting to note that the majority of the listings in other areas of Singapore are in the "private room" category.
The same observation, that the Central Region has the most Airbnb listings and that "entire home/apt" is the most frequently listed room type there, remains true in 2021.
From the map, we can also observe that there is a "hotel room" category that does not exist in the 2019 listings.
A quick glance at the map suggests that, overall, there seem to be fewer listings in 2021 than in 2019.
We would like to analyse each of the 4 individual room types, to analyse whether they are random or clustered, and if clustered, which are the clustered locations. This will help us in understanding the relationship between location and room type in Singapore.
The following code chunk will create a new data frame for each room type by filtering the listings on room type:
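One way to sketch this step is to split a listings table on its room_type column. The toy data frame below is made up. Only the column name room_type and the four category labels come from the listings data.

```r
# Toy stand-in for the listings table (rows are made up)
listings <- data.frame(
  id = 1:6,
  room_type = c("Private room", "Entire home/apt", "Shared room",
                "Private room", "Hotel room", "Entire home/apt")
)

# One data frame per room type, stored in a named list
by_type <- split(listings, listings$room_type)
private <- by_type[["Private room"]]

print(sapply(by_type, nrow))
```

Keeping the subsets in a named list makes it easy to apply the same analysis to every room type with lapply().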
Private Room Type (2021)

The graph shows the distribution of Airbnb private room types (2021) across the Singapore map, and we can see significant clustering at the Central area.
Private Room Type (2019)

The graph shows the distribution of Airbnb private room types across the Singapore map, and we can see that private room listings are significantly denser in 2021 than in 2019 throughout Singapore.
Shared Room Type (2021)

The graph shows the distribution of Airbnb shared room types (2021) across the Singapore map, and we can see significant clustering at the Central area and some listings in the north.
Shared Room Type (2019)

Unlike private room listings, the shared room listings appear to be more densely clustered in 2019 than in 2021.
Entire Home/Apartment (2021)

The graph shows the distribution of Airbnb apartment room types (2021) across the Singapore map, and we can see significant clustering at the Central area.
Entire Home/Apartment (2019)

The graph shows the distribution of Airbnb apartment room types across the Singapore map, and we can see that this room type is more geographically dispersed across Singapore than the other room types, though it is still clustered in the central area of Singapore.
Hotel Room Type (2021)

Hotel room type is only present in the airbnb21 dataframe and it is clustered around central region of Singapore.
spatstat requires the analytical data to be in ppp object form. There is no direct way to convert Spatial* classes into ppp objects, so we first need to convert the Spatial* classes into generic sp objects.
The code chunk below converts the Spatial* classes into generic sp objects.
The kernel density is estimated with a Gaussian (normal) kernel and a standard deviation (sigma) of 1000 m.
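A kernel density estimate with these settings can be sketched as follows. The pattern is 200 made-up points in a rectangular window measured in metres, so sigma = 1000 corresponds to the 1000 m bandwidth. If the pattern had been rescaled to kilometres, sigma would need to be 1 instead.

```r
library(spatstat)
set.seed(1)

# 200 made-up points scattered uniformly over a 40 km x 30 km rectangle (metres)
win <- owin(c(0, 40000), c(0, 30000))
X <- runifpoint(200, win = win)

# Gaussian kernel, sigma = 1000 m, with edge correction
kde <- density(X, sigma = 1000, edge = TRUE, kernel = "gaussian")

print(class(kde))
print(range(kde))
```

The result is a pixel image (class im) whose values are intensities, i.e. expected points per unit area.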
Kernel Density Map of Private Room Type


Kernel Density Map of Shared Room Type


Kernel Density Map of Apartment Room Type


Kernel Density Map of Hotel Room Type

Kernel Density Map of Private Room Type


Kernel Density Map of Shared Room Type


Kernel Density Map of Apartment Room Type


Kernel Density Map of Hotel Room Type

The result is the same; we only convert it so that it is suitable for mapping purposes.
With reference to the spatial point patterns observed previously, we will formulate the null hypothesis and alternative hypothesis, and select the confidence level.
Then, we will perform the test using appropriate second-order spatial point pattern analysis techniques before drawing statistical conclusions. We will use the G-function and the L-function. The G-function is based on nearest-neighbour distances between events, which is what we are interested in, whereas the F-function measures distances from arbitrary locations to the nearest event (empty space), which is less relevant to us. The K-function is based on pairwise distances between events. Since the L-function is the normalised version of the K-function, L(r) = sqrt(K(r) / pi), which equals r under complete spatial randomness, it makes more sense to use the L-function instead of the K-function.
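To make the G-function concrete, the sketch below computes it by hand for a made-up pattern: G(r) is the share of points whose nearest neighbour lies within distance r. No edge correction is applied, so this is a simplification of what Gest() in spatstat does, not a replacement for it.

```r
set.seed(7)

# 100 made-up points placed uniformly at random in the unit square
n <- 100
pts <- cbind(runif(n), runif(n))

# Nearest-neighbour distance of every point
d <- as.matrix(dist(pts))
diag(d) <- Inf
nnd <- apply(d, 1, min)

# Empirical G(r): share of points whose nearest neighbour lies within r
G_hat <- function(r) vapply(r, function(ri) mean(nnd <= ri), numeric(1))

# Theoretical G(r) under CSR with intensity lambda = n / area
lambda <- n / 1
G_csr <- function(r) 1 - exp(-lambda * pi * r^2)

r <- c(0.02, 0.05, 0.10)
print(cbind(r, observed = G_hat(r), csr = G_csr(r)))
```

An observed curve that rises faster than the CSR curve at short distances points to clustering, while a slower rise points to regularity.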
Private Room (2019)
G function estimation

Performing Complete Spatial Randomness Test
H0: Private room listings in Singapore in 2019 are randomly distributed.
H1: Private room listings in Singapore in 2019 are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G function (1000 simulations)
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Since the estimated G(r) function lies above the envelope from point 0, the estimated G(r) is statistically significant. Reject null hypothesis that the distribution of private room types in Singapore is randomly distributed.
As the G increases rapidly at the start (short distance), this tells us that the points of private room type in Singapore are clustered. This observation is consistent with the ppp graph that we have previously plotted to visualise.
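The envelope test above can be sketched with spatstat on a made-up clustered pattern. The cluster centres, spread and nsim = 39 are illustrative choices, not the settings of the original run. With 39 simulations the two-sided pointwise level is 2 / 40 = 0.05, whereas the 9 simulations logged above correspond to 2 / 10 = 0.2.

```r
library(spatstat)
set.seed(123)

# Made-up clustered pattern: 50 points around each of four centres
# in a 10 x 10 window
centres <- data.frame(x = c(2, 8, 2, 8), y = c(2, 2, 8, 8))
x <- rep(centres$x, each = 50) + rnorm(200, sd = 0.3)
y <- rep(centres$y, each = 50) + rnorm(200, sd = 0.3)
X <- ppp(x, y, window = owin(c(0, 10), c(0, 10)))

# Simulate CSR 39 times and build the pointwise envelope of G
env <- envelope(X, Gest, nsim = 39, verbose = FALSE)

# Share of distances at which the observed G lies above the upper envelope
print(mean(env$obs > env$hi, na.rm = TRUE))
```

Calling plot(env) draws the observed curve against the grey envelope, which is the kind of figure interpreted in the text.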
Private Room (2021)
G function estimation

Performing Complete Spatial Randomness Test
H0: Private room listings in Singapore in 2021 are randomly distributed.
H1: Private room listings in Singapore in 2021 are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G function (1000 simulations)
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Since the estimated G(r) function lies above the envelope from point 0, the estimated G(r) is statistically significant. Reject null hypothesis that the distribution of private room types in Singapore is randomly distributed.
As the G increases rapidly at the start (short distance), this tells us that the points of private room type in Singapore are clustered. This observation is consistent with the ppp graph that we have previously plotted to visualise.
Shared Room (2019)
G function estimation

Performing Complete Spatial Randomness Test
H0: Shared room listings in Singapore in 2019 are randomly distributed.
H1: Shared room listings in Singapore in 2019 are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G function (1000 simulations)
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Since the estimated G(r) function lies above the envelope from point 0, the estimated G(r) is statistically significant. Reject null hypothesis that the distribution of shared room types in Singapore is randomly distributed.
As the G increases rapidly at the start (short distance), this tells us that the points of shared room type in Singapore are clustered. This observation is consistent with the ppp graph that we have previously plotted to visualise.
Shared Room (2021)
G function estimation

Performing Complete Spatial Randomness Test
H0: Shared room listings in Singapore in 2021 are randomly distributed.
H1: Shared room listings in Singapore in 2021 are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G function (1000 simulations)
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Since the estimated G(r) function lies above the envelope from point 0, the estimated G(r) is statistically significant. Reject null hypothesis that the distribution of shared room types in Singapore is randomly distributed.
As the G increases rapidly at the start (short distance), this tells us that the points of shared room type in Singapore are clustered. This observation is consistent with the ppp graph that we have previously plotted to visualise.
Apartment Room Type (2019)
G function estimation

Performing Complete Spatial Randomness Test
H0: Apartment listings in Singapore in 2019 are randomly distributed.
H1: Apartment listings in Singapore in 2019 are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G function (1000 simulations)
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Since the estimated G(r) function lies above the envelope from point 0, the estimated G(r) is statistically significant. Reject null hypothesis that the distribution of apartment room types in Singapore is randomly distributed.
As the G increases rapidly at the start (short distance), this tells us that the points of apartment room type in Singapore are clustered. This observation is consistent with the ppp graph that we have previously plotted to visualise.
Apartment Room Type (2021)
G function estimation

Performing Complete Spatial Randomness Test
H0: Apartment listings in Singapore in 2021 are randomly distributed.
H1: Apartment listings in Singapore in 2021 are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G function (1000 simulations)
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Since the estimated G(r) function lies above the envelope from point 0, the estimated G(r) is statistically significant. Reject null hypothesis that the distribution of apartment room types in Singapore is randomly distributed.
As the G increases rapidly at the start (short distance), this tells us that the points of apartment room type in Singapore are clustered. This observation is consistent with the ppp graph that we have previously plotted to visualise.
Hotel Room (2021)
G function estimation

Performing Complete Spatial Randomness Test
H0: Hotel room listings in Singapore in 2021 are randomly distributed.
H1: Hotel room listings in Singapore in 2021 are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
Monte Carlo test with G function (1000 simulations)
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Since the estimated G(r) function lies above the envelope from point 0, the estimated G(r) is statistically significant. Reject null hypothesis that the distribution of hotel room types in Singapore is randomly distributed.
As the G increases rapidly at the start (short distance), this tells us that the points of hotel room type in Singapore are clustered. This observation is consistent with the ppp graph that we have previously plotted to visualise.
Private Room Type (2019)
COMPUTING L FUNCTION ESTIMATION

PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Private room listings in Singapore are randomly distributed.
H1 = Private room listings in Singapore are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
The code chunk below is used to perform the hypothesis testing.
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.
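The L-function test can be sketched in the same way as the G-function test. The clustered pattern and nsim = 39 below are made up for illustration and are not the settings of the original run. Under CSR the L-function satisfies L(r) = r, so observed values above the upper envelope indicate clustering at that distance.

```r
library(spatstat)
set.seed(321)

# Made-up clustered pattern: 50 points around each of four centres
# in a 10 x 10 window
centres <- data.frame(x = c(2, 8, 2, 8), y = c(2, 2, 8, 8))
x <- rep(centres$x, each = 50) + rnorm(200, sd = 0.3)
y <- rep(centres$y, each = 50) + rnorm(200, sd = 0.3)
X <- ppp(x, y, window = owin(c(0, 10), c(0, 10)))

# Simulate CSR 39 times and build the pointwise envelope of L
env_L <- envelope(X, Lest, nsim = 39, verbose = FALSE)

# Share of distances at which the observed L lies above the upper envelope
print(mean(env_L$obs > env_L$hi, na.rm = TRUE))
```

Plotting `env_L` with the formula `. - r ~ r` shows L(r) - r, which makes departures from the CSR reference line at zero easier to read.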

Private Room Type (2021)
COMPUTING L FUNCTION ESTIMATION

PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Private room listings in Singapore are randomly distributed.
H1 = Private room listings in Singapore are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
The code chunk below is used to perform the hypothesis testing.
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Shared Room Type (2019)
COMPUTING L FUNCTION ESTIMATION

PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Shared room listings in Singapore are randomly distributed.
H1 = Shared room listings in Singapore are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
The code chunk below is used to perform the hypothesis testing.
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Shared Room Type (2021)
COMPUTING L FUNCTION ESTIMATION

PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Shared room listings in Singapore are randomly distributed.
H1 = Shared room listings in Singapore are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
The code chunk below is used to perform the hypothesis testing.
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Apartment Room Type (2019)
COMPUTING L FUNCTION ESTIMATION

PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Apartment listings in Singapore are randomly distributed.
H1 = Apartment listings in Singapore are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
The code chunk below is used to perform the hypothesis testing.
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Apartment Room Type (2021)
COMPUTING L FUNCTION ESTIMATION

PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Apartment listings in Singapore are randomly distributed.
H1 = Apartment listings in Singapore are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
The code chunk below is used to perform the hypothesis testing.
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.

Hotel Room Type (2021)
COMPUTING L FUNCTION ESTIMATION

PERFORMING COMPLETE SPATIAL RANDOMNESS TEST
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypothesis and test are as follows:
H0 = Hotel room listings in Singapore are randomly distributed.
H1 = Hotel room listings in Singapore are not randomly distributed.
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
The code chunk below is used to perform the hypothesis testing.
Generating 9 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9.
Done.
